Overview

Dataset statistics

Number of variables: 35
Number of observations: 146267
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 35.4 MiB
Average record size in memory: 253.8 B
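
The memory figures are consistent with simple arithmetic. The Python sketch below rests on assumptions that the report does not state: columns reported at 1.1 MiB store 8 bytes per value (for example int64 or float64), columns reported at 142.8 KiB store 1 byte per value (for example uint8), index memory is not counted, and the average record size is the total size divided by the number of observations. The constant names are illustrative only.

```python
# Hypothetical check of the report's memory figures (names are illustrative).
# Assumptions: 8 bytes per value for columns shown as 1.1 MiB, 1 byte per value
# for columns shown as 142.8 KiB, and index memory not counted.

N_ROWS = 146267            # "Number of observations"
MIB = 2 ** 20
KIB = 2 ** 10

eight_byte_col = N_ROWS * 8 / MIB      # about 1.116 MiB
one_byte_col = N_ROWS * 1 / KIB        # about 142.84 KiB

# Average record size = total in-memory size / number of observations.
# 35.4 MiB is itself rounded, so this only confirms the displayed value.
avg_record = 35.4 * MIB / N_ROWS       # about 253.78 B

print(f"{eight_byte_col:.1f} MiB per 8-byte column")
print(f"{one_byte_col:.1f} KiB per 1-byte column")
print(f"{avg_record:.1f} B per record")
```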

Variable types

BOOL: 25
NUM: 9
CAT: 1

Reproduction

Analysis started: 2020-06-11 23:14:25.531464
Analysis finished: 2020-06-11 23:14:59.794808
Duration: 34.26 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration: config.yaml

Warnings

patientunitstayid has unique values (Unique)

Variables

patientunitstayid
Real number (ℝ≥0)

UNIQUE

Distinct count: 146267
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1761820.897673433
Minimum: 141168
Maximum: 3353254
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB
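
The overview fields follow from simple counts. This is a minimal sketch, not pandas-profiling's own code; the function name `overview` is hypothetical, and it assumes that "Unique (%)" is the distinct count divided by the number of non-missing values (nothing is missing in this dataset, so the denominator is simply the 146267 rows).

```python
import math

def overview(values):
    """Hypothetical re-creation of a few per-variable overview fields.

    Assumption: "Unique (%)" = distinct count / non-missing count * 100.
    NaN is treated as missing.
    """
    n = len(values)
    present = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    missing = n - len(present)
    distinct = len(set(present))
    return {
        "Distinct count": distinct,
        "Unique (%)": round(100 * distinct / len(present), 1),
        "Missing": missing,
        "Missing (%)": round(100 * missing / n, 1),
        "Zeros": sum(1 for v in present if v == 0),
    }

print(overview([0.0, 1.5, 1.5, 2.0, float("nan")]))

# The reported "Unique (%)" for admissionweight: 4196 distinct values in 146267 rows.
print(round(100 * 4196 / 146267, 1))  # 2.9
```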

Quantile statistics

Minimum: 141168
5th percentile: 228174.6
Q1: 967380.5
Median: 1682535
Q3: 2736056
95th percentile: 3208001.1
Maximum: 3353254
Range: 3212086
Interquartile range (IQR): 1768675.5
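
The quantile statistics can be illustrated with Python's standard library. This sketch assumes that percentiles are computed by linear interpolation between order statistics (the default in pandas and NumPy), which is the rule `statistics.quantiles(..., method="inclusive")` applies. The helper name is hypothetical and the toy input is not from this dataset; Range is Maximum minus Minimum and IQR is Q3 minus Q1 (here 2736056 - 967380.5 = 1768675.5).

```python
import statistics

def quantile_stats(values):
    """Sketch of the "Quantile statistics" fields (hypothetical helper).

    Assumes linear interpolation between order statistics, which is what
    statistics.quantiles(..., method="inclusive") computes. Needs >= 2 values.
    """
    xs = sorted(values)
    pct = statistics.quantiles(xs, n=100, method="inclusive")  # 99 cut points
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "Minimum": xs[0],
        "5th percentile": pct[4],      # cut point k-1 is the k-th percentile
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th percentile": pct[94],
        "Maximum": xs[-1],
        "Range": xs[-1] - xs[0],
        "Interquartile range (IQR)": q3 - q1,
    }

print(quantile_stats([4, 1, 3, 2]))
```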

Descriptive statistics

Standard deviation: 982682.3096
Coefficient of variation (CV): 0.557765157
Kurtosis: -1.29633439
Mean: 1761820.898
Median Absolute Deviation (MAD): 897845
Skewness: 0.03249551484
Sum: 2.576962572e+11
Variance: 9.656645217e+11
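
A sketch of how the descriptive statistics relate to one another; for example, CV is the standard deviation divided by the mean (for patientunitstayid, 982682.3096 / 1761820.898 ≈ 0.5578). It assumes pandas-style estimators, which the report does not state: sample variance with an n - 1 denominator, bias-corrected sample skewness, and bias-corrected excess kurtosis (0 for a normal distribution). MAD is taken as the median of absolute deviations from the median. The function name is hypothetical.

```python
import math
import statistics

def descriptive_stats(values):
    """Sketch of the "Descriptive statistics" fields (hypothetical helper).

    Assumes pandas-style estimators: sample variance (n - 1 denominator),
    bias-corrected skewness, and bias-corrected excess kurtosis.
    Requires at least 4 values.
    """
    xs = [float(v) for v in values]
    n = len(xs)
    mean = statistics.fmean(xs)
    var = statistics.variance(xs)                 # divides by n - 1
    std = math.sqrt(var)
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])  # median absolute deviation

    # Population central moments feed the bias-corrected estimators below.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": kurt,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": mad,
        "Skewness": skew,
        "Sum": math.fsum(xs),
        "Variance": var,
    }

print(descriptive_stats([1, 2, 3, 4, 5]))
```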

Histogram with fixed size bins (bins=10)

Common values

Value                  Count   Frequency (%)
2872418                1       < 0.1%
173310                 1       < 0.1%
1645833                1       < 0.1%
3239175                1       < 0.1%
615686                 1       < 0.1%
3230979                1       < 0.1%
2774779                1       < 0.1%
3012203                1       < 0.1%
1752317                1       < 0.1%
341260                 1       < 0.1%
Other values (146257)  146257  > 99.9%

Minimum 5 values

Value    Count  Frequency (%)
141168   1      < 0.1%
141194   1      < 0.1%
141197   1      < 0.1%
141203   1      < 0.1%
141208   1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
3353254  1      < 0.1%
3353251  1      < 0.1%
3353235  1      < 0.1%
3353216  1      < 0.1%
3353201  1      < 0.1%
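
The frequency tables can be sketched as follows: count each value, keep the ten most frequent, and fold the remainder into one "Other values (k)" row, where k is the number of remaining distinct values. The percentage formatting is inferred from the numbers in this report and is an assumption: a share that rounds to 0.0% is shown as "< 0.1%" (50 of 146267 rows), one that rounds to 100.0% without covering every row is shown as "> 99.9%", and anything else gets one decimal place (116 of 146267 rows is shown as 0.1%). The function names are hypothetical.

```python
from collections import Counter

def format_share(count, n):
    """Percentage text as inferred from the report (assumption, see lead-in)."""
    text = f"{100 * count / n:.1f}"
    if text == "0.0" and count > 0:
        return "< 0.1%"
    if text == "100.0" and count < n:
        return "> 99.9%"
    return text + "%"

def common_values(values, top=10):
    """Hypothetical "Common values" table: the `top` most frequent values,
    then one "Other values (k)" row covering the remaining k distinct values."""
    n = len(values)
    counts = Counter(values)
    rows = counts.most_common(top)  # ties keep first-seen order
    table = [(str(v), c, format_share(c, n)) for v, c in rows]
    rest = n - sum(c for _, c in rows)
    if rest:
        table.append((f"Other values ({len(counts) - len(rows)})", rest, format_share(rest, n)))
    return table

# The label variable's counts: 145815 zeros and 452 ones.
for row in common_values([0] * 145815 + [1] * 452):
    print(*row)
```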
 

label
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      145815  99.7%
1      452     0.3%

age
Real number (ℝ≥0)

Distinct count: 72
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.22381671874039
Minimum: 19
Maximum: 91
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 19
5th percentile: 30
Q1: 53
Median: 65
Q3: 76
95th percentile: 88
Maximum: 91
Range: 72
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 16.90112089
Coefficient of variation (CV): 0.2673220595
Kurtosis: -0.2856809446
Mean: 63.22381672
Median Absolute Deviation (MAD): 12
Skewness: -0.511601956
Sum: 9247558
Variance: 285.6478875

Histogram with fixed size bins (bins=10)

Common values

Value              Count   Frequency (%)
91                 5077    3.5%
67                 3706    2.5%
68                 3548    2.4%
71                 3471    2.4%
72                 3458    2.4%
66                 3422    2.3%
65                 3381    2.3%
63                 3311    2.3%
70                 3296    2.3%
62                 3216    2.2%
Other values (62)  110381  75.5%

Minimum 5 values

Value  Count  Frequency (%)
19     537    0.4%
20     517    0.4%
21     601    0.4%
22     595    0.4%
23     615    0.4%

Maximum 5 values

Value  Count  Frequency (%)
91     5077   3.5%
89     1391   1.0%
88     1587   1.1%
87     1827   1.2%
86     1995   1.4%

admissionweight
Real number (ℝ≥0)

Distinct count: 4196
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.0307176601694
Minimum: 40.1
Maximum: 396.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 40.1
5th percentile: 51
Q1: 66.6
Median: 80
Q3: 96.8
95th percentile: 129.8
Maximum: 396.9
Range: 356.8
Interquartile range (IQR): 30.2

Descriptive statistics

Standard deviation: 25.52466374
Coefficient of variation (CV): 0.3037539658
Kurtosis: 4.558820796
Mean: 84.03071766
Median Absolute Deviation (MAD): 14.9
Skewness: 1.446893105
Sum: 12290920.98
Variance: 651.508459

Histogram with fixed size bins (bins=10)

Common values

Value                Count   Frequency (%)
68                   1662    1.1%
81.6                 1409    1.0%
63.5                 1364    0.9%
90.7                 1259    0.9%
77.1                 1197    0.8%
72.6                 924     0.6%
59                   911     0.6%
75                   875     0.6%
74.8                 856     0.6%
70                   837     0.6%
Other values (4186)  134973  92.3%

Minimum 5 values

Value  Count  Frequency (%)
40.1   19     < 0.1%
40.2   12     < 0.1%
40.3   25     < 0.1%
40.37  5      < 0.1%
40.4   25     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
396.9  1      < 0.1%
362.8  1      < 0.1%
334.2  1      < 0.1%
313.4  1      < 0.1%
313    1      < 0.1%

admissionheight
Real number (ℝ≥0)

Distinct count: 578
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 169.67951622717356
Minimum: 101.6
Maximum: 227.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 101.6
5th percentile: 152.4
Q1: 162.56
Median: 170.1
Q3: 177.8
95th percentile: 187.9
Maximum: 227.3
Range: 125.7
Interquartile range (IQR): 15.24

Descriptive statistics

Standard deviation: 10.85334332
Coefficient of variation (CV): 0.06396378046
Kurtosis: 0.2086350256
Mean: 169.6795162
Median Absolute Deviation (MAD): 7.7
Skewness: -0.1583090499
Sum: 24818513.8
Variance: 117.7950613

Histogram with fixed size bins (bins=10)

Common values

Value               Count   Frequency (%)
167.6               8597    5.9%
177.8               8550    5.8%
160                 8481    5.8%
172.7               7884    5.4%
165.1               7768    5.3%
170.2               6751    4.6%
162.6               6544    4.5%
175.3               6021    4.1%
182.9               5979    4.1%
180.3               5727    3.9%
Other values (568)  73965   50.6%

Minimum 5 values

Value  Count  Frequency (%)
101.6  3      < 0.1%
101.7  1      < 0.1%
102.9  1      < 0.1%
103    1      < 0.1%
104    2      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
227.3  2      < 0.1%
218.4  1      < 0.1%
218    1      < 0.1%
213.4  2      < 0.1%
213    1      < 0.1%

bmi
Real number (ℝ≥0)

Distinct count: 45143
Unique (%): 30.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.129880421693137
Minimum: 11.543684925305568
Maximum: 145.40429527440193
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 11.54368493
5th percentile: 19.02522366
Q1: 23.58983547
Median: 27.52960509
Q3: 32.8125
95th percentile: 44.39340172
Maximum: 145.4042953
Range: 133.8606103
Interquartile range (IQR): 9.222664532

Descriptive statistics

Standard deviation: 8.330620073
Coefficient of variation (CV): 0.2859819523
Kurtosis: 6.699724974
Mean: 29.12988042
Median Absolute Deviation (MAD): 4.458424967
Skewness: 1.760076924
Sum: 4260740.22
Variance: 69.3992308

Histogram with fixed size bins (bins=10)

Common values

Value                 Count   Frequency (%)
24.20811              141     0.1%
28.69087371           129     0.1%
24.94679546           126     0.1%
27.35933163           124     0.1%
24.01776785           124     0.1%
23.29590458           124     0.1%
25.81229652           121     0.1%
22.60610272           120     0.1%
23.47414599           117     0.1%
29.049732             116     0.1%
Other values (45133)  145025  99.2%

Minimum 5 values

Value        Count  Frequency (%)
11.54368493  2      < 0.1%
11.78695759  1      < 0.1%
11.89881873  1      < 0.1%
12.15362178  1      < 0.1%
12.1831049   1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
145.4042953  1      < 0.1%
144.1728955  1      < 0.1%
131.3723148  2      < 0.1%
129.1567971  1      < 0.1%
126.7296786  1      < 0.1%
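
The bmi column appears to follow the standard body-mass-index formula, weight divided by the square of height. The report does not state units; this sketch assumes admissionweight is in kilograms and admissionheight in centimetres, which reproduces the sampled bmi of patientunitstayid 141168 (weight 84.3, height 152.4, bmi 36.295906).

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body-mass index = weight (kg) / height (m) squared.

    Assumption: admissionweight is in kilograms and admissionheight in
    centimetres; the report does not state the units.
    """
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

# Sampled row for patientunitstayid 141168: weight 84.3, height 152.4.
print(round(bmi(84.3, 152.4), 6))  # 36.295906, matching its bmi value
```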
 
gender_Female
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      79654   54.5%
1      66613   45.5%

ethnicity_African American
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      130148  89.0%
1      16119   11.0%

ethnicity_Asian
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      143820  98.3%
1      2447    1.7%

ethnicity_Caucasian
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
1      113989  77.9%
0      32278   22.1%

ethnicity_Hispanic
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      140733  96.2%
1      5534    3.8%

ethnicity_Native American
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      145118  99.2%
1      1149    0.8%

ethnicity_Other/Unknown
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      139238  95.2%
1      7029    4.8%

unitstaytype_admit
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
1      132943  90.9%
0      13324   9.1%

unitstaytype_readmit
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      139099  95.1%
1      7168    4.9%

unitstaytype_transfer
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.8 KiB

Value  Count   Frequency (%)
0      140111  95.8%
1      6156    4.2%

verbal
Real number (ℝ≥0)

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.038573294044453
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 4
Median: 5
Q3: 5
95th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.52568242
Coefficient of variation (CV): 0.3777775736
Kurtosis: -0.08317229152
Mean: 4.038573294
Median Absolute Deviation (MAD): 0
Skewness: -1.278498631
Sum: 590710
Variance: 2.327706846

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
5      93556  64.0%
1      25449  17.4%
4      18954  13.0%
3      5049   3.5%
2      3259   2.2%

Minimum 5 values

Value  Count  Frequency (%)
1      25449  17.4%
2      3259   2.2%
3      5049   3.5%
4      18954  13.0%
5      93556  64.0%

Maximum 5 values

Value  Count  Frequency (%)
5      93556  64.0%
4      18954  13.0%
3      5049   3.5%
2      3259   2.2%
1      25449  17.4%

motor
Real number (ℝ≥0)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.513943678341663
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 6
Median: 6
Q3: 6
95th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.217818093
Coefficient of variation (CV): 0.22086154
Kurtosis: 7.419355098
Mean: 5.513943678
Median Absolute Deviation (MAD): 0
Skewness: -2.869075097
Sum: 806508
Variance: 1.483080906

Histogram with fixed size bins (bins=10)

Common values

Value  Count   Frequency (%)
6      116829  79.9%
5      12745   8.7%
1      7686    5.3%
4      7618    5.2%
3      873     0.6%
2      516     0.4%

Minimum 5 values

Value  Count   Frequency (%)
1      7686    5.3%
2      516     0.4%
3      873     0.6%
4      7618    5.2%
5      12745   8.7%

Maximum 5 values

Value  Count   Frequency (%)
6      116829  79.9%
5      12745   8.7%
4      7618    5.2%
3      873     0.6%
2      516     0.4%

eyes
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
4      104794  71.6%
3      22268   15.2%
1      11855   8.1%
2      7350    5.0%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

thrombolytics
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      143788  98.3%
1      2479    1.7%

aids
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      146095  99.9%
1      172     0.1%

hepaticfailure
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      143835  98.3%
1      2432    1.7%

lymphoma
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      145577  99.5%
1      690     0.5%

metastaticcancer
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      143194  97.9%
1      3073    2.1%

leukemia
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      145162  99.2%
1      1105    0.8%

immunosuppression
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      142183  97.2%
1      4084    2.8%

cirrhosis
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      143461  98.1%
1      2806    1.9%

activetx
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
1      85367   58.4%
0      60900   41.6%

ima
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      141724  96.9%
1      4543    3.1%

midur
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      144879  99.1%
1      1388    0.9%

oobventday1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      98391   67.3%
1      47876   32.7%

oobintubday1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      109044  74.6%
1      37223   25.4%

diabetes
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count   Frequency (%)
0      111042  75.9%
1      35225   24.1%

visitnumber
Real number (ℝ≥0)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0626935672434656
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 8
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2840225017
Coefficient of variation (CV): 0.2672666048
Kurtosis: 51.00714779
Mean: 1.062693567
Median Absolute Deviation (MAD): 0
Skewness: 5.900270483
Sum: 155437
Variance: 0.08066878146

Histogram with fixed size bins (bins=10)

Common values

Value  Count   Frequency (%)
1      138324  94.6%
2      6976    4.8%
3      784     0.5%
4      133     0.1%
5      32      < 0.1%
6      11      < 0.1%
7      5       < 0.1%
8      2       < 0.1%

Minimum 5 values

Value  Count   Frequency (%)
1      138324  94.6%
2      6976    4.8%
3      784     0.5%
4      133     0.1%
5      32      < 0.1%

Maximum 5 values

Value  Count   Frequency (%)
8      2       < 0.1%
7      5       < 0.1%
6      11      < 0.1%
5      32      < 0.1%
4      133     0.1%

heartrate
Real number (ℝ≥0)

Distinct count: 118715
Unique (%): 81.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.2589246210441
Minimum: 0.0
Maximum: 172.66666666666663
Zeros: 2
Zeros (%): < 0.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 60.05188634
Q1: 72.36933798
Median: 83
Q3: 94.89955996
95th percentile: 112.9533373
Maximum: 172.6666667
Range: 172.6666667
Interquartile range (IQR): 22.53022198

Descriptive statistics

Standard deviation: 16.34307436
Coefficient of variation (CV): 0.193962532
Kurtosis: 0.08713487727
Mean: 84.25892462
Median Absolute Deviation (MAD): 11.19736842
Skewness: 0.3805836678
Sum: 12324300.13
Variance: 267.0960795

Histogram with fixed size bins (bins=10)

Common values

Value                  Count   Frequency (%)
80                     50      < 0.1%
70                     49      < 0.1%
76                     31      < 0.1%
69                     28      < 0.1%
90                     23      < 0.1%
67                     23      < 0.1%
68                     23      < 0.1%
78                     23      < 0.1%
81                     22      < 0.1%
86                     22      < 0.1%
Other values (118705)  145973  99.8%

Minimum 5 values

Value        Count  Frequency (%)
0            2      < 0.1%
1.533333333  1      < 0.1%
6.41025641   1      < 0.1%
29.40298507  1      < 0.1%
29.63799283  1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
172.6666667  1      < 0.1%
172.3333333  1      < 0.1%
163.1666667  1      < 0.1%
162.1465517  1      < 0.1%
161.8333333  1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (as long as both scale factors have the same sign), which implies that for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
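
A minimal Python sketch of the calculation just described, not the report's own implementation. The 1/n factors in the covariance and in the two standard deviations cancel, so they are omitted. The example also illustrates the invariance property: an exact linear transform of X gives r = +1 or -1 whatever the slope, while a monotonic but nonlinear relationship gives |r| < 1. The function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/n (or 1/(n - 1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 7 for x in xs]))   # exact linear, positive slope: 1
print(pearson_r(xs, [-0.5 * x for x in xs]))    # exact linear, negative slope: -1
print(pearson_r(xs, [x ** 3 for x in xs]))      # monotonic but nonlinear: below 1
```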

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
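
The rank-based calculation can be sketched in Python: replace each variable by its ranks and compute Pearson's r on the ranks. Tied values receive the average of the ranks they span; that is a common convention and an assumption here, not something the report specifies. Helper names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman's rho = Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

print(average_ranks([10, 20, 20, 30]))                    # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # 1 up to rounding
```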

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
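
A direct O(n²) Python sketch of this pair-counting definition (often called τ-a). Pairs tied in X or in Y count as neither concordant nor discordant; the tie-adjusted τ-b variant, which SciPy's `kendalltau` computes by default, changes the denominator and is not shown. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1      # pair ordered the same way in X and Y
        elif s < 0:
            discordant += 1      # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# 5 concordant pairs and 1 discordant pair out of 6: tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```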

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

Sample

First rows

,patientunitstayid,label,age,admissionweight,admissionheight,bmi,gender_Female,ethnicity_African American,ethnicity_Asian,ethnicity_Caucasian,ethnicity_Hispanic,ethnicity_Native American,ethnicity_Other/Unknown,unitstaytype_admit,unitstaytype_readmit,unitstaytype_transfer,verbal,motor,eyes,thrombolytics,aids,hepaticfailure,lymphoma,metastaticcancer,leukemia,immunosuppression,cirrhosis,activetx,ima,midur,oobventday1,oobintubday1,diabetes,visitnumber,heartrate
0,141168,0,70,84.3,152.4,36.295906,1,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,125.052830
1,141194,0,68,73.9,180.3,22.732803,0,0,0,1,0,0,0,1,0,0,4,6,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,86.860627
2,141197,0,71,102.1,162.6,38.617545,0,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,97.307692
3,141203,0,77,70.2,160.0,27.421875,1,0,0,1,0,0,0,1,0,0,1,3,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,91.543554
4,141208,0,25,95.3,172.7,31.952749,1,0,0,1,0,0,0,1,0,0,5,6,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,77.817460
5,141227,0,82,82.2,185.4,23.914007,0,0,0,1,0,0,0,1,0,0,4,6,3,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,108.975610
6,141229,0,91,89.8,160.0,35.078125,1,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,66.225806
7,141233,0,81,61.7,165.1,22.635548,1,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,1,0,0,1,1,0,1,106.710801
8,141244,0,59,92.3,180.3,28.392932,0,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,103.253472
9,141260,0,43,69.9,172.7,23.436486,1,1,0,0,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,72.793651

Last rows

,patientunitstayid,label,age,admissionweight,admissionheight,bmi,gender_Female,ethnicity_African American,ethnicity_Asian,ethnicity_Caucasian,ethnicity_Hispanic,ethnicity_Native American,ethnicity_Other/Unknown,unitstaytype_admit,unitstaytype_readmit,unitstaytype_transfer,verbal,motor,eyes,thrombolytics,aids,hepaticfailure,lymphoma,metastaticcancer,leukemia,immunosuppression,cirrhosis,activetx,ima,midur,oobventday1,oobintubday1,diabetes,visitnumber,heartrate
146257,3353194,0,51,63.05,170.2,21.765366,1,0,0,0,0,0,1,1,0,0,1,1,1,0,0,0,0,0,1,0,0,1,0,0,1,1,1,1,80.391608
146258,3353196,0,66,71.50,157.5,28.823381,1,0,0,1,0,0,0,0,1,0,5,6,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,93.156780
146259,3353197,0,66,71.50,157.5,28.823381,1,0,0,1,0,0,0,1,0,0,3,6,4,0,0,0,0,0,0,0,0,1,1,0,1,1,0,1,98.879433
146260,3353198,0,66,71.50,157.5,28.823381,1,0,0,1,0,0,0,0,1,0,1,4,2,0,0,0,0,0,0,0,0,1,0,0,1,1,0,4,77.864769
146261,3353200,0,66,71.50,157.5,28.823381,1,0,0,1,0,0,0,0,1,0,5,6,4,0,0,0,0,0,0,0,0,1,0,0,1,1,0,5,97.790210
146262,3353201,0,66,71.50,157.5,28.823381,1,0,0,1,0,0,0,0,1,0,5,6,3,0,0,0,0,0,0,0,0,1,0,0,1,1,0,3,79.658451
146263,3353216,0,50,55.40,165.1,20.324301,1,1,0,0,0,0,0,1,0,0,1,5,1,0,0,0,0,0,0,0,0,1,0,0,1,1,0,1,69.608541
146264,3353235,0,50,90.00,175.3,29.287256,0,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,87.089623
146265,3353251,0,73,102.00,177.8,32.265371,0,1,0,0,0,0,0,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,71.550877
146266,3353254,0,81,83.90,185.4,24.408579,0,0,0,1,0,0,0,1,0,0,5,6,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,76.178571